Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[external-assets] Create external assets defs for undefined assets in repo construction #20145

Merged

Conversation

smackesey
Copy link
Collaborator

@smackesey smackesey commented Feb 29, 2024

Summary & Motivation

Currently it is possible for assets to be referenced as dependencies when they have no corresponding AssetsDefinition (only when using deps). This complicates implementation of the AssetGraph and AssetLayer.

This PR generates unexecutable external assets during repository construction (the same place source assets are converted to external assets) for any referenced keys without a corresponding user-provided definition.

How I Tested These Changes

Existing test suite. One test was removed-- this made sure the internal asset graph could handle undefined asset references, but we no longer rely on this behavior.

@smackesey smackesey force-pushed the sean/external-assets-repo-create-assets-for-undefined branch 2 times, most recently from 212504b to 3efaa6f Compare February 29, 2024 17:03
@smackesey smackesey force-pushed the sean/external-assets-execution-unit branch from 59f3d5f to 1fd1ee8 Compare February 29, 2024 17:21
@smackesey smackesey force-pushed the sean/external-assets-repo-create-assets-for-undefined branch from 3efaa6f to 5422a3b Compare February 29, 2024 17:21
@smackesey smackesey force-pushed the sean/external-assets-execution-unit branch from 1fd1ee8 to b6ea604 Compare February 29, 2024 17:45
@smackesey smackesey force-pushed the sean/external-assets-repo-create-assets-for-undefined branch 2 times, most recently from dce4ba8 to d77ea65 Compare February 29, 2024 18:16
@smackesey smackesey changed the base branch from sean/external-assets-execution-unit to sean/external-assets-hoist-input-key-resolution-to-repo-builder March 1, 2024 16:36
@smackesey smackesey force-pushed the sean/external-assets-repo-create-assets-for-undefined branch from d77ea65 to 101a9c6 Compare March 1, 2024 16:36
@smackesey smackesey marked this pull request as ready for review March 1, 2024 17:59
@smackesey smackesey force-pushed the sean/external-assets-hoist-input-key-resolution-to-repo-builder branch from 00eb64c to 2f03e42 Compare March 1, 2024 18:16
@smackesey smackesey force-pushed the sean/external-assets-repo-create-assets-for-undefined branch from 101a9c6 to 599d04b Compare March 1, 2024 18:16
@smackesey smackesey force-pushed the sean/external-assets-hoist-input-key-resolution-to-repo-builder branch from 2f03e42 to 7fdcaea Compare March 1, 2024 18:47
@smackesey smackesey force-pushed the sean/external-assets-repo-create-assets-for-undefined branch from 599d04b to 311d0ef Compare March 1, 2024 18:47
@smackesey smackesey force-pushed the sean/external-assets-hoist-input-key-resolution-to-repo-builder branch from 7fdcaea to ae90738 Compare March 1, 2024 19:31
@smackesey smackesey force-pushed the sean/external-assets-repo-create-assets-for-undefined branch from 311d0ef to 3bb4896 Compare March 1, 2024 19:31
@smackesey smackesey force-pushed the sean/external-assets-hoist-input-key-resolution-to-repo-builder branch from ae90738 to 5003390 Compare March 2, 2024 20:43
@smackesey smackesey force-pushed the sean/external-assets-repo-create-assets-for-undefined branch from 3bb4896 to d73e923 Compare March 2, 2024 20:43
@smackesey smackesey force-pushed the sean/external-assets-repo-create-assets-for-undefined branch from d73e923 to c015d8f Compare March 3, 2024 21:40
@smackesey smackesey force-pushed the sean/external-assets-hoist-input-key-resolution-to-repo-builder branch from 072ed7a to c2031ac Compare March 3, 2024 22:13
@smackesey smackesey force-pushed the sean/external-assets-repo-create-assets-for-undefined branch from c015d8f to 0b5aec4 Compare March 3, 2024 22:13
@smackesey smackesey force-pushed the sean/external-assets-hoist-input-key-resolution-to-repo-builder branch from c2031ac to c753e70 Compare March 4, 2024 16:56
@smackesey smackesey force-pushed the sean/external-assets-repo-create-assets-for-undefined branch from 0b5aec4 to cdd3edc Compare March 4, 2024 16:56
@smackesey smackesey force-pushed the sean/external-assets-hoist-input-key-resolution-to-repo-builder branch from c753e70 to 9fa2ff0 Compare March 4, 2024 18:23
@smackesey smackesey force-pushed the sean/external-assets-repo-create-assets-for-undefined branch from cdd3edc to 94444b2 Compare March 4, 2024 18:23
@smackesey smackesey force-pushed the sean/external-assets-hoist-input-key-resolution-to-repo-builder branch from 9fa2ff0 to 42c75ff Compare March 4, 2024 20:29
@smackesey smackesey force-pushed the sean/external-assets-repo-create-assets-for-undefined branch from 94444b2 to 98c621b Compare March 4, 2024 20:29
@schrockn
Copy link
Member

schrockn commented Mar 4, 2024

Does this cause any user-facing changes? Do we render bare asset key references differently?

Copy link
Member

@schrockn schrockn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Req'ing changes for q mgmt

@smackesey smackesey force-pushed the sean/external-assets-hoist-input-key-resolution-to-repo-builder branch from 42c75ff to 5401f3b Compare March 4, 2024 21:07
@smackesey
Copy link
Collaborator Author

Does this cause any user-facing changes? Do we render bare asset key references differently?

I didn't realize this but it turns out that we do. Given these defs with foo_source "bare" and bar_source a defined SourceAsset:

### 240304-stub_vs_source.py

from dagster import (
    asset,
    Definitions,
    AssetExecutionContext,
    SourceAsset,
)



@asset(deps=["foo_source"])
def foo(context: AssetExecutionContext):
    ...

bar_source = SourceAsset("bar_source")

@asset
def bar(context: AssetExecutionContext, bar_source):
    ...

defs = Definitions(
    assets=[foo, bar, bar_source],
)

Before this PR:

Screenshot 2024-03-04 at 4 13 38 PM

After this PR:

Screenshot 2024-03-04 at 4 14 14 PM

I'm not really sure why we do this since both foo_source and bar_source are AFAICT treated otherwise identically, but maybe we stick some special metadata on the auto-created stub defs to keep rendering the same?

Base automatically changed from sean/external-assets-hoist-input-key-resolution-to-repo-builder to master March 4, 2024 21:25
@schrockn
Copy link
Member

schrockn commented Mar 5, 2024

@smackesey on reflection I think this actually a fairly important product decision.

It gives us the opportunity to allow people to model external assets without even forcing them to explicitly learn the concept upfront. It will just appear in the UI. However we need to be deliberate about that decision as there are arguments both ways.

We should either

  1. Do nothing
  2. Add metadata to retain the old behavior
  3. Propose this change to (esp Sandy and Josh) in an internal discussion

@smackesey
Copy link
Collaborator Author

So after some further investigation, the rendering change is smaller than I thought. The above difference I showed is only observed when looking at the default asset group. There is no change when looking at an asset job, which looks like this both before and after the change:

image

I believe the status quo style is probably not the result of a principled decision. Basically, the same graph rendering code is used for op-jobs, asset-jobs, and asset groups. The "small" style is used to show that a job is loading from source assets that aren't defined in the job, or an asset group has dependencies outside the group. The reason we see a difference when viewing the default group is that, in the status quo, the group_name of the ExternalAssetNode representing bare dependencies is null, whereas after this change it is default.

Further, I don't think the group name was ever intended to be set to null for bare dependencies, because we explicitly set it when defining the ExternalAssetNode for them-- it just happens to always be None because of the convoluted logic of `get_external_asset_nodes_from_defs:

    asset_keys_without_definitions = all_upstream_asset_keys.difference(assets_defs_by_key.keys())

    asset_nodes = [
        ExternalAssetNode(
            asset_key=asset_key,
            dependencies=list(deps[asset_key].values()),
            depended_by=list(dep_by[asset_key].values()),
            execution_type=AssetExecutionType.UNEXECUTABLE,
            job_names=[],
            group_name=group_name_by_asset_key.get(asset_key),  ## HERE
            code_version=code_version_by_asset_key.get(asset_key),
        )
        for asset_key in asset_keys_without_definitions
    ]

In any case, I updated the PR to keep the group_name of a bare dep as null until we make the above decision. I hope that allows us to proceed with this PR w/out being blocked on that decision. (Bare deps were already represented as ExternalAssetNode, the initially observed rendering change here was just from a slight change to the representation, i.e. having group name become "default").

FWIW I favor option (3), i.e. feel that bare deps should be treated just like explicitly defined SourceAsset. I had some discussion on it with @OwenKephart here: https://dagsterlabs.slack.com/archives/C03A0D72A6T/p1709238575807419

@smackesey smackesey force-pushed the sean/external-assets-repo-create-assets-for-undefined branch 2 times, most recently from edfd67f to d455262 Compare March 5, 2024 14:27
Copy link

github-actions bot commented Mar 5, 2024

Deploy preview for dagster-docs ready!

Preview available at https://dagster-docs-nad859cn1-elementl.vercel.app
https://sean-external-assets-repo-create-assets-for-undefined.dagster.dagster-docs.io

Direct link to changed pages:

@smackesey smackesey force-pushed the sean/external-assets-repo-create-assets-for-undefined branch from d455262 to 5a2551a Compare March 5, 2024 14:35
Copy link
Member

@schrockn schrockn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you create linear task for eliminating SYSTEM_METADATA_KEY_AUTO_CREATED_STUB_ASSET and link to the issue in the places where we will want to delete when we come to a resolution on what to do about these bare references?

@smackesey smackesey force-pushed the sean/external-assets-repo-create-assets-for-undefined branch from 5a2551a to 6eadaed Compare March 5, 2024 18:12
@smackesey smackesey merged commit db6db44 into master Mar 5, 2024
1 check was pending
@smackesey smackesey deleted the sean/external-assets-repo-create-assets-for-undefined branch March 5, 2024 18:32
PedramNavid pushed a commit that referenced this pull request Mar 28, 2024
… repo construction (#20145)

## Summary & Motivation

Currently it is possible for assets to be referenced as dependencies
when they have no corresponding `AssetsDefinition` (only when using
`deps`). This complicates implementation of the `AssetGraph` and
`AssetLayer`.

This PR generates unexecutable external assets during repository
construction (the same place source assets are converted to external
assets) for any referenced keys without a corresponding user-provided
definition.

## How I Tested These Changes

Existing test suite. One test was removed-- this made sure the internal
asset graph could handle undefined asset references, but we no longer
rely on this behavior.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants